We study different ways of determining the mean distance $\langle r_n\rangle$ between a reference point and its n-th neighbour among random points distributed with uniform density in a D-dimensional Euclidean space. First we present a heuristic method; though this method provides only a crude mathematical result, it shows a simple way of estimating $\langle r_n\rangle$. Next we describe two alternative means of deriving the exact expression of $\langle r_n\rangle$: we review the method using absolute probability and develop an alternative method using conditional probability. Finally we obtain an approximation to $\langle r_n\rangle$ from the mean volume between the reference point and its n-th neighbour and compare it with the heuristic and exact results.
I. INTRODUCTION

Consider random geometrical points, i.e., points with uncorrelated positions, distributed uniformly in a D-dimensional Euclidean space, with a density of N points per unit volume. A point is said to be the n-th neighbour of another (the reference point) if there are exactly $n-1$ other points that are closer to the latter than the former. We address the following problem: what is the mean distance $\langle r_n\rangle$ between a given reference point and its n-th neighbour, $n < N$? This is essentially a problem of geometrical interest. However, the quantity $\langle r_n\rangle$ is relevant in certain physical and computational contexts: for example, in astrophysics it gives the mean distance between neighbouring stars distributed independently in a homogeneous model of the universe. In optimization theory the values of $\langle r_n\rangle$ help in estimating the optimal length of a closed path connecting a given set of points in space, as in the case of the travelling salesman problem. It may also help in determining the statistical properties of complex networks.

In this article we study different ways of determining the mean n-th neighbour distance. We first present a heuristic method of estimating $\langle r_n\rangle$ by using a physical picture. Next we describe the derivation of the exact expression of $\langle r_n\rangle$ in two ways: we review the method using absolute probability and derive the result by an alternative method using conditional probability. The former method is comprehensive, while the latter is more analytic and explicitly illustrates the notion of ensemble average, i.e., the mean value of a macroscopic quantity calculated over all possible configurations of a system. Finally we calculate the mean volume which separates the reference point from its n-th neighbour and obtain an approximate expression of $\langle r_n\rangle$ from its radius. We determine how far this approximation deviates from the exact expression of $\langle r_n\rangle$.
II. A HEURISTIC METHOD

We begin with a heuristic approach to find the mean first neighbour distance. Consider a unit volume of the space 
described in the introduction, say, in the form of a hypersphere or a hypercube containing exactly N random points 
including the reference one. Let us divide this unit volume into N equal parts. Since the N random points are 
distributed uniformly over the unit volume, each part is expected to contain just one of these. The mean distance $\langle r_1\rangle$ between any point and its first neighbour is naively given by the linear extent of each part. Since the volume of each part is $1/N$, we expect

$$\langle r_1(N)\rangle \approx \left(\frac{1}{N}\right)^{1/D} . \tag{1}$$
We extend the above result for the first neighbour to the n-th neighbour by the following heuristic argument. We choose any one of the points as the reference and locate its n-th neighbour, $n < N$. The expected distance between them is $\langle r_n(N)\rangle$. Keeping these two points fixed, we change the number of points in the unit volume to $N\alpha$ by adding or removing points at random; the factor $\alpha$ is arbitrary to the extent that $N\alpha$ and $n\alpha$ are natural numbers. Since the distribution of points is uniform, the hypersphere that had originally enclosed n points is now expected to contain $n\alpha$ points. Therefore, what was originally the n-th neighbour of the reference point is now expected to be the $n\alpha$-th neighbour, for which the expected distance from the reference point is $\langle r_{n\alpha}(N\alpha)\rangle$. Since the two points under consideration are fixed, so is the distance between them. Consequently

$$\langle r_n(N)\rangle \approx \langle r_{n\alpha}(N\alpha)\rangle . \tag{2}$$

The above relation is approximate, as the change in the density of points does not always convert the n-th neighbour of the reference point to exactly its $n\alpha$-th neighbour. Now we take $\alpha = 1/n$, so that

$$\langle r_n(N)\rangle \approx \left\langle r_1\!\left(\frac{N}{n}\right)\right\rangle , \tag{3}$$

which shows that the mean n-th neighbour distance for a set of N random points distributed uniformly is approximately given by the mean first neighbour distance for a depleted set of $N/n$ random points in the same volume. The above relation is derived for values of n that divide N exactly; however, this approximate relation may be used for any value of n when $N \gg n$, by replacing $N/n$ with the integer nearest to it. Using the expression of $\langle r_1(N)\rangle$ from Eq. (1) we get

$$\langle r_n(N)\rangle \approx \left(\frac{n}{N}\right)^{1/D} , \tag{4}$$



which, therefore, requires $N \gg n$ as a necessary condition. The results of this section are extremely crude approximations; however, they provide us with a rough picture of the mean n-th neighbour distance.
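The heuristic recipe of Eqs. (1), (3) and (4) is simple enough to state as code. The sketch below is only an illustration; the function names `heuristic_first_neighbour` and `heuristic_nth_neighbour` are hypothetical helpers, not part of the text. It applies the nearest-integer rule for $N/n$ described above, using Python's built-in `round`, which resolves exact ties to the even integer (an assumption, since the text does not specify tie-breaking).

```python
import math

# Illustrative sketch; function names are hypothetical, not from the text.


def heuristic_first_neighbour(N: int, D: int) -> float:
    """Eq. (1): linear extent of one of N equal parts of a unit volume."""
    return (1.0 / N) ** (1.0 / D)


def heuristic_nth_neighbour(n: int, N: int, D: int) -> float:
    """Eq. (3): first-neighbour estimate for a depleted set of about N/n points.

    N/n is replaced by the nearest integer, so the result is meaningful only
    when N >> n; when n divides N this reduces to Eq. (4), (n/N)**(1/D).
    """
    depleted = max(1, round(N / n))  # nearest integer to N/n, at least one point
    return heuristic_first_neighbour(depleted, D)


# n = 4 divides N = 100, so Eqs. (3) and (4) should agree.
print(math.isclose(heuristic_nth_neighbour(4, 100, 2), (4 / 100) ** (1 / 2)))
```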



III. THE METHOD OF ABSOLUTE PROBABILITY 

We now review the derivation of the exact expression of $\langle r_n\rangle$ by using the theory of absolute probability. The method described is similar to that followed in Ref. Consider the system of random points described at the beginning of the introduction. Taking a certain random point as the reference, there will be $N-1$ other random points within a D-dimensional hypersphere of unit volume with the reference point at its center. For a given reference point, the absolute probability of finding its n-th neighbour ($n < N$) at a distance between $r_n$ and $r_n + dr_n$ from it is given by the probability that, out of the $N-1$ random points (other than the reference point) distributed uniformly within the hypersphere of unit volume, exactly $n-1$ points lie within a concentric hypersphere of radius $r_n$ and at least one of the remaining $N-n$ points lies within the shell of internal radius $r_n$ and thickness $dr_n$:

$$P(r_n)\,dr_n = \binom{N-1}{n-1} V_n^{\,n-1} \sum_{q=1}^{N-n} \binom{N-n}{q} (1-V_n)^{N-n-q} (dV_n)^q , \tag{5}$$

where

$$V_n = \frac{\pi^{D/2}}{\Gamma\left(\frac{D}{2}+1\right)}\, r_n^{D} \tag{6}$$

is the volume of the D-dimensional hypersphere of radius $r_n$ centered at the reference point, and

$$\binom{N-1}{n-1} = \frac{(N-1)!}{(n-1)!\,(N-n)!}, \qquad \binom{N-n}{q} = \frac{(N-n)!}{q!\,(N-n-q)!} \tag{7}$$

are binomial coefficients. Ignoring differentials of order higher than the first ($q = 1$) in Eq. (5), we get:
$$P(r_n)\,dr_n = \binom{N-1}{n-1} (N-n)\, V_n^{\,n-1} (1-V_n)^{N-n-1}\, dV_n . \tag{8}$$

Since the n-th neighbour ($n < N$) must certainly lie within a unit volume centered at the reference point, its mean distance from the reference point is given by:

$$\langle r_n\rangle = \int_0^R r_n\, P(r_n)\, dr_n , \tag{9}$$

where R is the radius of the D-dimensional hypersphere of unit volume:

$$R = \frac{\left[\Gamma\left(\frac{D}{2}+1\right)\right]^{1/D}}{\pi^{1/2}} . \tag{10}$$
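Eqs. (6) and (10) can be checked against each other numerically: the radius R of Eq. (10) should give a hypersphere whose volume, by Eq. (6), is exactly one. A minimal sketch using only the standard `math` module; the function names are our own and hypothetical.

```python
import math

# Illustrative sketch; function names are hypothetical, not from the text.


def hypersphere_volume(r: float, D: int) -> float:
    """Eq. (6): volume of a D-dimensional hypersphere of radius r."""
    return math.pi ** (D / 2) * r ** D / math.gamma(D / 2 + 1)


def unit_volume_radius(D: int) -> float:
    """Eq. (10): radius R of the D-dimensional hypersphere of unit volume."""
    return math.gamma(D / 2 + 1) ** (1.0 / D) / math.sqrt(math.pi)


for D in (1, 2, 3, 10):
    R = unit_volume_radius(D)
    # The middle column should be 1 up to floating-point rounding.
    print(D, round(hypersphere_volume(R, D), 12), R)
```

For D = 1 the "hypersphere" of unit volume is the interval $[-1/2, 1/2]$, so R = 1/2; for D = 3 one recovers $R^3 = 3/(4\pi)$.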

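Changing the variable of integration in Eq. (9) from radius to volume (by the relation of Eq. (6)) and using the probability distribution of Eq. (8), we get the exact result for the mean n-th neighbour distance:

$$\begin{aligned}
\langle r_n(N)\rangle &= \frac{\left[\Gamma\left(\frac{D}{2}+1\right)\right]^{1/D}}{\pi^{1/2}} \binom{N-1}{n-1}(N-n)\int_0^1 V_n^{\,n+\frac{1}{D}-1}(1-V_n)^{N-n-1}\,dV_n \\
&= \frac{\left[\Gamma\left(\frac{D}{2}+1\right)\right]^{1/D}}{\pi^{1/2}} \binom{N-1}{n-1}(N-n)\,B\left(n+\frac{1}{D},\,N-n\right) \\
&= \frac{\left[\Gamma\left(\frac{D}{2}+1\right)\right]^{1/D}}{\pi^{1/2}}\,\frac{\Gamma\left(n+\frac{1}{D}\right)}{\Gamma(n)}\,\frac{\Gamma(N)}{\Gamma\left(N+\frac{1}{D}\right)} .
\end{aligned} \tag{11}$$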



Here $B(x,y)$ is the beta function, defined as $B(x,y) = \int_0^1 t^{x-1}(1-t)^{y-1}\,dt$, and $\Gamma(z)$ is the complete gamma function: $\Gamma(z) = \int_0^\infty t^{z-1}e^{-t}\,dt$. These functions are related by the formula $B(x,y) = \Gamma(x)\,\Gamma(y)/\Gamma(x+y)$.
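The last line of Eq. (11) is convenient for computation once the gamma functions are handled in logarithmic form, since $\Gamma(N)$ overflows a double-precision float for N above about 171. The sketch below (helper names are our own and hypothetical) evaluates Eq. (11) with `math.lgamma` and compares it with a brute-force Monte Carlo estimate in which $N-1$ points are placed uniformly in the unit-volume hypersphere around a reference point at the origin. Uniform points in the hypersphere are drawn with the standard construction of a Gaussian direction and a radius $R\,U^{1/D}$; this sampling method is our choice, not something taken from the text.

```python
import math
import random

# Illustrative sketch; function names are hypothetical, not from the text.


def unit_volume_radius(D: int) -> float:
    """Eq. (10): radius of the D-dimensional hypersphere of unit volume."""
    return math.gamma(D / 2 + 1) ** (1.0 / D) / math.sqrt(math.pi)


def exact_mean_distance(n: int, N: int, D: int) -> float:
    """Last line of Eq. (11), evaluated through log-gamma to avoid overflow."""
    log_ratio = (math.lgamma(n + 1.0 / D) - math.lgamma(n)
                 + math.lgamma(N) - math.lgamma(N + 1.0 / D))
    return unit_volume_radius(D) * math.exp(log_ratio)


def random_point_in_hypersphere(D: int, R: float, rng: random.Random) -> list:
    """Uniform point in a D-dimensional hypersphere of radius R."""
    g = [rng.gauss(0.0, 1.0) for _ in range(D)]   # isotropic direction
    norm = math.sqrt(sum(x * x for x in g))
    radius = R * rng.random() ** (1.0 / D)        # uniform in volume
    return [radius * x / norm for x in g]


def monte_carlo_mean_distance(n: int, N: int, D: int,
                              trials: int = 10_000, seed: int = 0) -> float:
    """Average n-th neighbour distance of a reference point at the origin."""
    rng = random.Random(seed)
    R = unit_volume_radius(D)
    total = 0.0
    for _ in range(trials):
        distances = sorted(
            math.sqrt(sum(x * x for x in random_point_in_hypersphere(D, R, rng)))
            for _ in range(N - 1)  # the N - 1 points other than the reference
        )
        total += distances[n - 1]
    return total / trials


print(exact_mean_distance(2, 10, 3), monte_carlo_mean_distance(2, 10, 3))
```

As a hand check, for D = 2 and N = 2 the single other point is uniform in a disc of radius $R = \pi^{-1/2}$, whose mean distance from the centre is $2R/3$, and Eq. (11) gives the same value.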

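For large values of N we get, by using Stirling's approximation [10] for the gamma function, $\Gamma(N+1/D)/\Gamma(N) \approx N^{1/D}$; therefore, for a large density N, Eq. (11) reduces to the following asymptotic form:

$$\langle r_n(N)\rangle \approx \frac{\left[\Gamma\left(\frac{D}{2}+1\right)\right]^{1/D}}{\pi^{1/2}}\,\frac{\Gamma\left(n+\frac{1}{D}\right)}{\Gamma(n)}\left(\frac{1}{N}\right)^{1/D} . \tag{12}$$

If the neighbour index n is also large (but $n < N$), we have $\Gamma(n+1/D)/\Gamma(n) \approx n^{1/D}$, and the complete asymptotic expression of the mean n-th neighbour distance is given by:

$$\langle r_n(N)\rangle \approx \frac{\left[\Gamma\left(\frac{D}{2}+1\right)\right]^{1/D}}{\pi^{1/2}}\left(\frac{n}{N}\right)^{1/D} . \tag{13}$$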

The above equation shows that the expression of $\langle r_n(N)\rangle$ obtained by heuristic means [Eq. (4)] has the correct asymptotic dependence on N and n.
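The approach of Eq. (11) to its asymptotic forms can be seen numerically. The sketch below (helper names are our own and hypothetical) prints the ratios of Eqs. (12) and (13) to the exact Eq. (11) for D = 3. As N grows, the first ratio tends to 1 for any n, while the second tends to 1 only when n is also large; for n = 1 it stays near $1/\Gamma(1+1/D)$.

```python
import math

# Illustrative sketch; function names are hypothetical, not from the text.


def prefactor(D: int) -> float:
    """[Gamma(D/2 + 1)]^(1/D) / pi^(1/2), the radius R of Eq. (10)."""
    return math.gamma(D / 2 + 1) ** (1.0 / D) / math.sqrt(math.pi)


def exact_mean_distance(n: int, N: int, D: int) -> float:
    """Eq. (11), via log-gamma."""
    return prefactor(D) * math.exp(math.lgamma(n + 1.0 / D) - math.lgamma(n)
                                   + math.lgamma(N) - math.lgamma(N + 1.0 / D))


def asymptotic_large_N(n: int, N: int, D: int) -> float:
    """Eq. (12): large density N, arbitrary n."""
    return (prefactor(D) * math.exp(math.lgamma(n + 1.0 / D) - math.lgamma(n))
            * N ** (-1.0 / D))


def asymptotic_large_n_and_N(n: int, N: int, D: int) -> float:
    """Eq. (13): both n and N large, n < N."""
    return prefactor(D) * (n / N) ** (1.0 / D)


D = 3
for n, N in [(1, 10**3), (1, 10**6), (100, 10**6), (10**4, 10**8)]:
    exact = exact_mean_distance(n, N, D)
    print(n, N,
          asymptotic_large_N(n, N, D) / exact,
          asymptotic_large_n_and_N(n, N, D) / exact)
```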



IV. THE METHOD OF CONDITIONAL PROBABILITY 



Next we develop an alternative way of deriving the exact expression of $\langle r_n\rangle$ by using the theory of conditional probability. We proceed by assuming that we look for the n-th neighbour of a reference point only after its first $n-1$ neighbours have been located. In that case the reference point and its first $n-1$ neighbours are considered as given; now the probability $\mathcal{P}(r_n)\,dr_n$ of finding the n-th neighbour of the reference point at a distance between $r_n$ and $r_n + dr_n$ from it is a conditional probability, as the n-th neighbour must certainly lie outside the hypersphere of radius $r_{n-1}$. The remaining $N-n$ points are then distributed uniformly over the volume $1 - V_{n-1}$ outside this hypersphere, so that

$$\mathcal{P}(r_n)\,dr_n = \sum_{q=1}^{N-n} \binom{N-n}{q} \left(\frac{dV_n}{1-V_{n-1}}\right)^{q} \left(1 - \frac{V_n - V_{n-1}}{1-V_{n-1}}\right)^{N-n-q} . \tag{14}$$

Here $V_n$ is the volume of the D-dimensional hypersphere of radius $r_n$ centered at the reference point [Eq. (6)]. Ignoring differentials of order higher than the first ($q = 1$) in Eq. (14), we get:

$$\mathcal{P}(r_n)\,dr_n = (N-n)\left(1 - \frac{V_n - V_{n-1}}{1-V_{n-1}}\right)^{N-n-1} \frac{dV_n}{1-V_{n-1}} . \tag{15}$$



For a given reference point and its first $n-1$ neighbours, the mean n-th neighbour distance is thus obtained as:

$$\langle r_n\rangle^{(\mathrm{conditional})} = \int_{r_{n-1}}^{R} r_n\, \mathcal{P}(r_n)\, dr_n , \tag{16}$$

where, as before, R is the radius of a D-dimensional hypersphere of unit volume. The quantity $\langle r_n\rangle^{(\mathrm{conditional})}$ is a function of the particular values $r_{n-1}, r_{n-2}, \ldots, r_1$, which are the distances of the first $n-1$ neighbours of the reference point. To remove the dependence of the mean n-th neighbour distance on the particular set of values of the first $n-1$ neighbour distances, the quantity $\langle r_n\rangle^{(\mathrm{conditional})}$ must be averaged successively over the probability distributions of each of the first $n-1$ neighbours:

$$\langle r_n\rangle = \int_0^R dr_1\, \mathcal{P}(r_1) \int_{r_1}^R dr_2\, \mathcal{P}(r_2) \cdots \int_{r_{n-3}}^R dr_{n-2}\, \mathcal{P}(r_{n-2}) \int_{r_{n-2}}^R dr_{n-1}\, \mathcal{P}(r_{n-1})\, \langle r_n\rangle^{(\mathrm{conditional})} , \tag{17}$$

where the probability distribution of the i-th neighbour distance is given by Eq. (15) with i replacing n (and $V_0 = 0$ for the first neighbour). This step is equivalent to an ensemble average in statistical mechanics [11]. We change all the variables of integration in Eq. (17) from radii to the corresponding volumes (by the relation of Eq. (6)) and change the order of the integrals such that the integral with respect to $V_n$ has to be evaluated last. Since each denominator $(1-V_{i-1})^{N-i}$ in Eq. (15) cancels against the numerator of the preceding factor, we get:

$$\langle r_n(N)\rangle = \frac{\left[\Gamma\left(\frac{D}{2}+1\right)\right]^{1/D}}{\pi^{1/2}} (N-1)(N-2)\cdots(N-n) \int_0^1 dV_n\, V_n^{1/D}(1-V_n)^{N-n-1} \int_0^{V_n} dV_1 \int_{V_1}^{V_n} dV_2 \cdots \int_{V_{n-3}}^{V_n} dV_{n-2} \int_{V_{n-2}}^{V_n} dV_{n-1} , \tag{18}$$



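The nested integrals over $V_1, \ldots, V_{n-1}$ cover the ordered region $0 < V_1 < \cdots < V_{n-1} < V_n$ and equal $V_n^{\,n-1}/(n-1)!$; since $(N-1)(N-2)\cdots(N-n)/(n-1)! = \binom{N-1}{n-1}(N-n)$, this gives the final form of the mean n-th neighbour distance:

$$\begin{aligned}
\langle r_n(N)\rangle &= \frac{\left[\Gamma\left(\frac{D}{2}+1\right)\right]^{1/D}}{\pi^{1/2}} \binom{N-1}{n-1}(N-n)\int_0^1 V_n^{\,n+\frac{1}{D}-1}(1-V_n)^{N-n-1}\,dV_n \\
&= \frac{\left[\Gamma\left(\frac{D}{2}+1\right)\right]^{1/D}}{\pi^{1/2}}\,\frac{\Gamma\left(n+\frac{1}{D}\right)}{\Gamma(n)}\,\frac{\Gamma(N)}{\Gamma\left(N+\frac{1}{D}\right)} .
\end{aligned} \tag{19}$$

This is identical to the result of Eq. (11) obtained in the previous section.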
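The conditional-probability construction also suggests a simple Monte Carlo counterpart of the ensemble average in Eq. (17): draw $V_1, V_2, \ldots, V_n$ one after another, each from the conditional law of Eq. (15) given the previous one, and average $R\,V_n^{1/D}$. Integrating Eq. (15), with i in place of n, from v up to the outer boundary $V = 1$ gives the conditional probability that the i-th neighbour lies beyond volume v, namely $[(1-v)/(1-V_{i-1})]^{N-i}$, which can be inverted in closed form. This sequential sampler is our own illustration of the argument, not a procedure described in the text, and the function names are hypothetical.

```python
import math
import random

# Illustrative sketch; function names are hypothetical, not from the text.


def unit_volume_radius(D: int) -> float:
    """Eq. (10): radius of the D-dimensional hypersphere of unit volume."""
    return math.gamma(D / 2 + 1) ** (1.0 / D) / math.sqrt(math.pi)


def exact_mean_distance(n: int, N: int, D: int) -> float:
    """Eq. (19) (identical to Eq. (11)), via log-gamma."""
    return unit_volume_radius(D) * math.exp(
        math.lgamma(n + 1.0 / D) - math.lgamma(n)
        + math.lgamma(N) - math.lgamma(N + 1.0 / D))


def sample_nth_volume(n: int, N: int, rng: random.Random) -> float:
    """Draw V_1 < V_2 < ... < V_n sequentially from Eq. (15); return V_n."""
    v = 0.0  # V_0 = 0: no neighbour has been located yet
    for i in range(1, n + 1):
        u = rng.random()
        # Invert the survival function ((1 - v_new) / (1 - v)) ** (N - i) = u.
        v = 1.0 - (1.0 - v) * u ** (1.0 / (N - i))
    return v


def ensemble_mean_distance(n: int, N: int, D: int,
                           samples: int = 50_000, seed: int = 0) -> float:
    """Monte Carlo version of the nested average in Eq. (17)."""
    rng = random.Random(seed)
    R = unit_volume_radius(D)
    # r_n = R * V_n ** (1/D) inverts Eq. (6) for the unit-volume hypersphere.
    return sum(R * sample_nth_volume(n, N, rng) ** (1.0 / D)
               for _ in range(samples)) / samples


print(ensemble_mean_distance(3, 20, 2), exact_mean_distance(3, 20, 2))
```

Because $n < N$, the exponent $1/(N-i)$ is always finite, and each draw uses only the previous volume, mirroring the way Eq. (17) conditions each neighbour on the one before it.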



V. THE MEAN VOLUME ESTIMATE 



Instead of calculating the mean distance to the n-th neighbour, we now calculate the mean volume $\langle V_n\rangle$ separating the reference point from its n-th neighbour. The volume separating a reference point from its n-th neighbour located at a distance $r_n$ from it is defined as the volume of the hypersphere of radius $r_n$ centered at the reference point. Therefore, from Eq. (6) we get:

$$\langle V_n\rangle = \frac{\pi^{D/2}}{\Gamma\left(\frac{D}{2}+1\right)}\,\langle r_n^D\rangle . \tag{20}$$

Using the absolute probability distribution of Eq. (8), we get

$$\langle r_n^D\rangle = \int_0^R r_n^D\, P(r_n)\, dr_n , \tag{21}$$

where R is defined in Eq. (10). Consequently, we get from the above two equations:



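$$\begin{aligned}
\langle V_n\rangle &= \binom{N-1}{n-1}(N-n)\int_0^1 V_n^{\,n}(1-V_n)^{N-n-1}\,dV_n \\
&= \binom{N-1}{n-1}(N-n)\,B(n+1,\,N-n) \\
&= \frac{\Gamma(n+1)}{\Gamma(n)}\,\frac{\Gamma(N)}{\Gamma(N+1)} = \frac{n}{N} .
\end{aligned} \tag{22}$$

Now we may estimate the mean n-th neighbour distance by using the following approximation: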



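$$\langle r_n\rangle \approx \langle r_n^D\rangle^{1/D} . \tag{23}$$

Combining Eqs. (20), (22) and (23), we get:

$$\langle r_n\rangle \approx \langle r_n^D\rangle^{1/D} = \frac{\left[\Gamma\left(\frac{D}{2}+1\right)\right]^{1/D}}{\pi^{1/2}}\left(\frac{n}{N}\right)^{1/D} . \tag{24}$$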



The above result is the same as the asymptotic expression of $\langle r_n\rangle$ obtained in Eq. (13), and thus Eq. (23) is valid as $N \to \infty$ and $n \to \infty$, $n < N$. Comparison with Eq. (4), which lacks the prefactor $\left[\Gamma\left(\frac{D}{2}+1\right)\right]^{1/D}/\pi^{1/2}$, shows that this approximation is a better estimate of $\langle r_n\rangle$ than the result obtained by heuristic means.
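Two ingredients of this estimate lend themselves to a quick numerical check: Eq. (22), $\langle V_n\rangle = n/N$, and the convergence of the estimate (24) towards the exact Eq. (11). Because the volume enclosed by each of the $N-1$ points is uniform on $[0, 1]$ for points spread uniformly over the unit-volume hypersphere, $V_n$ is simply the n-th smallest of $N-1$ independent uniform numbers. The sketch below uses that observation (ours, not the text's) to estimate $\langle V_n\rangle$ and then compares Eqs. (24) and (11); the helper names are hypothetical.

```python
import math
import random

# Illustrative sketch; function names are hypothetical, not from the text.


def prefactor(D: int) -> float:
    """[Gamma(D/2 + 1)]^(1/D) / pi^(1/2), i.e. R of Eq. (10)."""
    return math.gamma(D / 2 + 1) ** (1.0 / D) / math.sqrt(math.pi)


def exact_mean_distance(n: int, N: int, D: int) -> float:
    """Eq. (11), via log-gamma."""
    return prefactor(D) * math.exp(math.lgamma(n + 1.0 / D) - math.lgamma(n)
                                   + math.lgamma(N) - math.lgamma(N + 1.0 / D))


def mean_volume_estimate(n: int, N: int, D: int) -> float:
    """Eq. (24): <r_n^D>^(1/D), using <V_n> = n/N from Eq. (22)."""
    return prefactor(D) * (n / N) ** (1.0 / D)


def sampled_mean_volume(n: int, N: int, trials: int = 20_000, seed: int = 0) -> float:
    """Average of the n-th smallest of N - 1 uniform volume fractions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += sorted(rng.random() for _ in range(N - 1))[n - 1]
    return total / trials


print(sampled_mean_volume(3, 10), 3 / 10)   # Eq. (22)
for n, N in [(1, 10), (10, 1000), (1000, 10**6)]:
    # Ratio of the mean-volume estimate to the exact result (D = 3).
    print(n, N, mean_volume_estimate(n, N, 3) / exact_mean_distance(n, N, 3))
```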

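The error in this estimate of $\langle r_n\rangle$ by Eq. (23) is given by $\langle r_n^D\rangle^{1/D} - \langle r_n\rangle$. From Eqs. (11) and (24) we get:

$$\langle r_n^D\rangle^{1/D} - \langle r_n\rangle = \frac{\left[\Gamma\left(\frac{D}{2}+1\right)\right]^{1/D}}{\pi^{1/2}}\left[\left(\frac{n}{N}\right)^{1/D} - \frac{\Gamma\left(n+\frac{1}{D}\right)}{\Gamma(n)}\,\frac{\Gamma(N)}{\Gamma\left(N+\frac{1}{D}\right)}\right] . \tag{25}$$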
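Eq. (25) is easy to tabulate. The sketch below (the function name is our own and hypothetical) evaluates the error for fixed n and N over several dimensions; for D = 1 it vanishes up to rounding, and for D ≥ 2 it comes out positive.

```python
import math

# Illustrative sketch; the function name is hypothetical, not from the text.


def mean_volume_error(n: int, N: int, D: int) -> float:
    """Eq. (25): <r_n^D>^(1/D) - <r_n>, evaluated via log-gamma."""
    prefactor = math.gamma(D / 2 + 1) ** (1.0 / D) / math.sqrt(math.pi)
    gamma_ratio = math.exp(math.lgamma(n + 1.0 / D) - math.lgamma(n)
                           + math.lgamma(N) - math.lgamma(N + 1.0 / D))
    return prefactor * ((n / N) ** (1.0 / D) - gamma_ratio)


for D in (1, 2, 3, 5, 10, 50):
    print(D, mean_volume_error(2, 100, D))
```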



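It is obvious that the error is zero for D = 1, since the gamma-function factor in Eq. (25) then reduces to $\Gamma(n+1)\Gamma(N)/[\Gamma(n)\Gamma(N+1)] = n/N$. For all finite dimensions $D \geq 2$ the error is greater than zero; this follows from Jensen's inequality, $\langle (r_n^D)^{1/D}\rangle \le \langle r_n^D\rangle^{1/D}$, since $x^{1/D}$ is strictly concave for $D \geq 2$. For large values of D we get: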



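$$\frac{\Gamma\left(n+\frac{1}{D}\right)}{\Gamma(n)} \approx \Gamma\left(1+\frac{1}{D}\right)\left[1 + \frac{1}{D}\,H_{n-1}\right] , \tag{26}$$

and a similar expression for $\Gamma(N+1/D)/\Gamma(N)$, where $H_n = \sum_{k=1}^{n} 1/k$ are called harmonic numbers [12]. Consequently Eq. (25) reduces to the form: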



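$$\langle r_n^D\rangle^{1/D} - \langle r_n\rangle \approx \frac{\left[\Gamma\left(\frac{D}{2}+1\right)\right]^{1/D}}{\pi^{1/2}}\left[\left(\frac{n}{N}\right)^{1/D} - \frac{1 + \frac{1}{D}H_{n-1}}{1 + \frac{1}{D}H_{N-1}}\right] . \tag{27}$$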



Since $\lim_{n\to\infty}(H_n - \ln n) = \gamma = \lim_{N\to\infty}(H_N - \ln N)$, where $\gamma = 0.5772156649\ldots$ is Euler's constant [13], for large values of n and N Eq. (27) may be written as:
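$$\langle r_n^D\rangle^{1/D} - \langle r_n\rangle \approx \frac{\left[\Gamma\left(\frac{D}{2}+1\right)\right]^{1/D}}{\pi^{1/2}}\left[\left(\frac{n}{N}\right)^{1/D} - \frac{1 + \frac{1}{D}(\ln n + \gamma)}{1 + \frac{1}{D}(\ln N + \gamma)}\right] . \tag{28}$$

The error is then practically zero since, for large values of D, we get:

$$\frac{1 + \frac{1}{D}(\ln n + \gamma)}{1 + \frac{1}{D}(\ln N + \gamma)} \approx \exp\left[\frac{1}{D}\ln\left(\frac{n}{N}\right)\right] = \left(\frac{n}{N}\right)^{1/D} . \tag{29}$$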
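The large-D steps, Eqs. (26), (27) and (29), can also be checked numerically. The sketch below (names are our own and hypothetical) compares the exact gamma-function ratio appearing in Eq. (25) with its harmonic-number approximation of Eq. (27) and with $(n/N)^{1/D}$ of Eq. (29), for n = 10, N = 1000 and increasing D.

```python
import math

# Illustrative sketch; function names are hypothetical, not from the text.


def harmonic(m: int) -> float:
    """H_m = sum_{k=1}^{m} 1/k, with H_0 = 0."""
    return sum(1.0 / k for k in range(1, m + 1))


def gamma_ratio(x: int, D: int) -> float:
    """Gamma(x + 1/D) / Gamma(x), the quantity expanded in Eq. (26)."""
    return math.exp(math.lgamma(x + 1.0 / D) - math.lgamma(x))


def gamma_ratio_large_D(x: int, D: int) -> float:
    """Right-hand side of Eq. (26)."""
    return math.gamma(1.0 + 1.0 / D) * (1.0 + harmonic(x - 1) / D)


n, N = 10, 1000
for D in (10, 100, 1000):
    exact = gamma_ratio(n, D) / gamma_ratio(N, D)                          # ratio in Eq. (25)
    harmonic_form = (1 + harmonic(n - 1) / D) / (1 + harmonic(N - 1) / D)  # Eq. (27)
    power_form = (n / N) ** (1.0 / D)                                      # Eq. (29)
    print(D, exact, harmonic_form, power_form)
```

At moderate n the remaining gap between the exact ratio and $(n/N)^{1/D}$ reflects $H_{n-1} - (\ln n + \gamma) \approx -1/(2n)$, which is why Eq. (28) needs large n and N as well as large D.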



VI. CONCLUDING REMARKS 

In this article we have studied two kinds of approaches to determine the mean n-th neighbour distance in a system of uniformly distributed random points. In one kind of approach we construct the solution from a physical picture of the system; though it produces only an approximate mathematical result, it helps to visualise the solution. The heuristic method of Section II and the mean volume method of Section V are of this kind. The other kind of approach is rigorous and produces the exact mathematical result: while the method of absolute probability used in Section III is largely pedagogical, the method of conditional probability used in Section IV provides a detailed insight into the problem. Though the heuristic estimate of $\langle r_n\rangle$ is close to the exact result only for large values of n, N and D, along with the condition $N \gg n$, the advantage of the heuristic method lies in its simplicity; however approximate, it captures the essence of $\langle r_n\rangle$. Therefore, in cases where the distribution of points in space is such that an exact evaluation of $\langle r_n\rangle$ is not possible, heuristic constructions similar to this one may be useful.